Sample Code - Splunk
Splunk can be used in a variety of ways to measure podcasts.

Counting Downloads

Basic one used by NPR

A daily summary index is run, storing the number of downloads for each program each day.

Summary Index
index: summary_podcasts_by_day_program
named: si_podcasts_by_day_program

    sourcetype=podcast Method=GET (Format=mp3 OR Format=mp4 OR Format=m4v) Bytes > 200000 eventtype!=is_bot NOT test=true
    | rex field=ByteRange "(?<ByteRangeStart>^[0-9]*)"
    | eval ByteRangeStart=if(isnull(ByteRangeStart), 0, ByteRangeStart)
    | eval ObjectSize=if(isnull(ObjectSize), 1000000000000, ObjectSize)
    | where ByteRangeStart < ObjectSize
    | search (Status=200 OR ((Status=000 OR Status=206) AND (ByteRange="0-*" OR ByteRange="-")))
    | lookup BannedIPs IPAddress
    | where isnull(BannedDate)
    | eval UniqueDownloader=IPAddress . UserAgent
    | stats count by ProgramID UniqueDownloader Filename date_mday date_month date_year
    | sistats count by ProgramID

This summary index is then queried to view the results.

    index=summary_podcasts_by_day_program search_name=si_podcasts_by_day_program earliest=-13mon@mon
    | timechart span=1mon count

Counting Unique Downloaders

Counting Downloaded Hours

At least for now, there is no way to tell how much of a given episode a downloader listened to. In this way podcast sponsorship is much like print advertising. Therefore, it's good practice to avoid language such as “This episode was heard 354 times” and to say instead “This episode was downloaded 354 times.”

However, it may be useful to report the duration of files downloaded, for example to weight hour-long episodes more than five-minute shorts. We can’t report how much was listened to, but we can gauge how much was downloaded. When reporting hours, the fraction of the file downloaded can be multiplied by the duration of the file to get the time downloaded for that download.

Getting the Duration of the File

Ideally, metadata is available giving the exact (to the second) duration of the episode.
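When the exact duration is known, the hours calculation described above (fraction of the file downloaded times the duration of the file) can be sketched outside Splunk as follows. This is a minimal illustration in Python, not NPR's implementation. The function and parameter names are hypothetical, and capping the fraction at 1.0 is an assumption made here so that a request reporting more bytes than the file size is not credited with more than one full episode.

```python
def downloaded_hours(bytes_downloaded, file_size_bytes, duration_seconds):
    """Hours credited to one download: fraction of the file downloaded times its duration."""
    if file_size_bytes <= 0:
        raise ValueError("file_size_bytes must be positive")
    # Assumption: never credit more than the whole file for a single download.
    fraction = min(bytes_downloaded / file_size_bytes, 1.0)
    return fraction * duration_seconds / 3600.0


# Half of a 30 MB, one-hour episode was downloaded.
print(downloaded_hours(15_000_000, 30_000_000, 3600))  # 0.5
```

Summing this value over all downloads of an episode gives the downloaded hours for that episode.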
When this metadata is not available, the duration can be approximated from the size of the file and a conversion factor. To get the conversion factor, a random selection of episodes can be examined, taking their size in bytes and dividing by their duration in seconds. The average of these bytes per second values can be used as the conversion factor. Where heterogeneity in factors is expected (between talk and music, or for different bit rates, for example), a conversion factor can be calculated for each relevant set of files.

If the number of bytes for the entire file is not available, the number of bytes downloaded can be used. These should come from lines in the logs that have not been filtered out because the start of the byte range is greater than the file size (see Counting Downloads, above).

Cases Depending on Data Available

As long as the bytes of the request are available, there are three pieces of information (the duration of the file in seconds, the file size, and a bytes per second factor for converting file sizes to durations) that may or may not be available. Their availability affects how the duration calculation can be done. The bytes per second factor is only relevant where the duration of the file is not available and the size of the file is available, but its use is made clear in the following matrices.

Suppose the duration of the file is available. Then hours can be calculated as follows:

Suppose the duration of the file is not available. Then hours can be calculated as follows:

There are five cases specified above:
# duration and size of file are available: the fraction of the file downloaded times the duration is used
# duration of file is available but size is not: we assume the entire file was downloaded, and the duration of the file is used
# duration is not available but bytes per second is available: the bytes downloaded are converted to seconds directly using the bytes per second conversion factor
# neither duration nor bytes per second is available: the largest of the bytes per second values in the system is used, which gives the lowest estimate of hours
# none of duration, bytes per second, or file size is available: no calculation can be made
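The five cases above, together with the conversion factor, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method used in production. All function and parameter names are hypothetical. The factor is computed as bytes divided by seconds, matching the "bytes per second" name used in the cases. The fifth case is read here as "no duration and no bytes per second value anywhere in the system".

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def estimate_bytes_per_second(samples: Sequence[Tuple[int, float]]) -> float:
    """Average bytes per second over sampled (size_bytes, duration_seconds) pairs."""
    return mean(size / duration for size, duration in samples)


def downloaded_seconds(
    bytes_downloaded: int,
    duration: Optional[float] = None,          # seconds, from episode metadata
    file_size: Optional[int] = None,           # bytes in the whole file
    bytes_per_second: Optional[float] = None,  # factor for this file's category
    system_rates: Sequence[float] = (),        # all factors known in the system
) -> Optional[float]:
    """Seconds credited to one download, or None when nothing can be computed."""
    if duration is not None:
        if file_size:
            # Case 1: fraction of the file downloaded times its duration.
            # Assumption: the fraction is capped at 1.0.
            return min(bytes_downloaded / file_size, 1.0) * duration
        # Case 2: size unknown, assume the entire file was downloaded.
        return float(duration)
    if bytes_per_second:
        # Case 3: convert bytes downloaded directly to seconds.
        return bytes_downloaded / bytes_per_second
    if system_rates:
        # Case 4: the largest rate gives the lowest estimate.
        return bytes_downloaded / max(system_rates)
    # Case 5: no calculation can be made.
    return None


rate = estimate_bytes_per_second([(16_000_000, 1000.0), (32_000_000, 2000.0)])
print(rate)                                                            # 16000.0
print(downloaded_seconds(8_000_000, duration=1800, file_size=16_000_000))  # 900.0
print(downloaded_seconds(8_000_000, bytes_per_second=rate))            # 500.0
```

Dividing the result by 3600 converts it to hours. Returning `None` for the last case keeps downloads that cannot be measured out of the total instead of counting them as zero-length.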